DM825 - Introduction to Machine Learning
Sheet 11, Spring 2013
Exercise 1 – Probability theory
Prove the following rule:

p(x_i | x_{-i}) = p(x_1, …, x_N) / ∑_{x_i} p(x_1, …, x_N)

where x_{-i} = {x_1, …, x_N} ∖ {x_i}.
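Assuming the rule in question is this standard full-conditional identity, one possible route (a sketch, not the fully justified argument the exercise asks for) combines the product rule and the sum rule:

```latex
p(x_i \mid x_{-i})
  = \frac{p(x_i, x_{-i})}{p(x_{-i})}                             % product rule
  = \frac{p(x_1, \ldots, x_N)}{\sum_{x_i} p(x_1, \ldots, x_N)}.  % sum rule in the denominator
```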
Exercise 2 – Naive Bayes
Consider the binary classification problem of spam email in which a
binary label Y ∈ {0, 1} is to be predicted from a feature vector
X = (X_1, X_2, …, X_n), where X_i = 1 if word i is present in the
email and 0 otherwise. Consider a naive Bayes model, in which the
components X_i are assumed mutually conditionally independent given
the class label Y.
- [a] Draw a directed graphical model corresponding to the naive
Bayes model.
- [b] Find a mathematical expression for the posterior
class probability p(Y = 1 | x), in terms of the prior class
probability p(Y = 1) and the class-conditional densities p(x_i | y).
- [c] Now make explicit the parameters of the Bernoulli
distributions for Y and X_i. Call them µ and θ_i, respectively.
Assume a beta distribution as the prior for each of these parameters
and show how to learn them from a set of training data
(y_j, x⃗_j), j = 1, …, m, using a Bayesian approach. Compare this
solution with the one developed in class via maximum likelihood.
(A short code sketch of this Beta–Bernoulli setup follows the list.)
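Not prescribed by the sheet, but as a rough illustration of parts [b] and [c]: a minimal NumPy sketch comparing maximum-likelihood estimates with Beta-posterior means (prior pseudo-counts added to the ML counts), both plugged into the naive Bayes posterior p(Y = 1 | x). The data, the Beta(1, 1) choice, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: m emails, n binary word-presence features.
# This data is randomly generated and purely illustrative.
m, n = 200, 5
y = rng.integers(0, 2, size=m)       # labels y_j in {0, 1}
X = rng.integers(0, 2, size=(m, n))  # feature vectors x_j in {0, 1}^n

a, b = 1.0, 1.0  # Beta(a, b) prior hyperparameters (an assumed choice)

# Maximum-likelihood estimates: relative frequencies of the counts.
mu_ml = y.mean()                                               # p(Y = 1)
theta_ml = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])  # p(X_i = 1 | Y = c)

# Bayesian estimates: means of the Beta posteriors.  A Beta(a, b) prior with a
# Bernoulli likelihood gives a Beta(a + #ones, b + #zeros) posterior, so the
# posterior mean is the ML frequency with prior pseudo-counts added.
mu_b = (a + y.sum()) / (a + b + m)
theta_b = np.stack([(a + X[y == c].sum(axis=0)) / (a + b + (y == c).sum())
                    for c in (0, 1)])

def posterior_y1(x, mu, theta):
    """Naive Bayes posterior p(Y = 1 | x) from the prior p(Y = 1) and the
    class-conditionals p(x_i | y) = theta^x * (1 - theta)^(1 - x)."""
    lik = lambda c: np.prod(theta[c] ** x * (1 - theta[c]) ** (1 - x))
    num = mu * lik(1)
    return num / (num + (1 - mu) * lik(0))

x_new = rng.integers(0, 2, size=n)
print("ML       p(Y=1|x):", posterior_y1(x_new, mu_ml, theta_ml))
print("Bayesian p(Y=1|x):", posterior_y1(x_new, mu_b, theta_b))
```

Note how Beta(1, 1) reproduces Laplace smoothing, and how the posterior means approach the ML frequencies as m grows.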
Exercise 3 – Directed Graphical Models
Consider the graph in Figure 1.
- Write down the standard factorization for the given
graph.
- For what pairs (i, j) does the statement X_i is independent of
X_j hold? (Don’t assume any conditioning in this part.)
- Suppose that we condition on {X_2, X_9}, shown shaded in the
graph. What is the largest set A for which the statement X_1 is
conditionally independent of X_A given {X_2, X_9} holds?
- What is the largest set B for which X_8 is conditionally
independent of X_B given {X_2, X_9} holds?
- Suppose that I wanted to draw a sample from the marginal
distribution p(x_5) = Pr[X_5 = x_5]. (Don’t assume that X_2 and
X_9 are observed.) Describe an efficient algorithm to do so without
actually computing the marginal; one such approach is sketched
below, after Figure 1.
Figure 1: A directed graph.
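The last part is essentially asking for ancestral (forward) sampling. Since Figure 1 itself is not reproduced here, the sketch below uses a small hypothetical DAG in its place; the graph structure, the CPT numbers, and all names are illustrative assumptions. Using the standard factorization p(x) = ∏_i p(x_i | x_pa(i)), sampling each ancestor of X_5 in topological order and then X_5 itself yields an exact draw from the marginal p(x_5), with no marginal ever computed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for Figure 1 (the actual graph is not reproduced):
# parents[i] lists the parents of binary variable X_i, and cpt[i] maps each
# joint parent assignment to p(X_i = 1 | parents).  All numbers are made up.
parents = {1: [], 2: [1], 3: [1], 4: [2], 5: [3, 4]}
cpt = {
    1: {(): 0.6},
    2: {(0,): 0.2, (1,): 0.7},
    3: {(0,): 0.5, (1,): 0.1},
    4: {(0,): 0.3, (1,): 0.8},
    5: {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.9},
}

def ancestors(v):
    """All ancestors of v in the DAG, found by walking parent links."""
    seen, stack = set(), list(parents[v])
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(parents[u])
    return seen

def sample_marginal(v):
    """Draw one exact sample of X_v from its marginal by ancestral sampling:
    sample only v and its ancestors, in topological (here: numeric) order,
    each from p(x_i | pa(x_i)); every other variable can be ignored."""
    needed = sorted(ancestors(v) | {v})
    x = {}
    for i in needed:
        pa = tuple(x[p] for p in parents[i])
        x[i] = int(rng.random() < cpt[i][pa])
    return x[v]

draws = [sample_marginal(5) for _ in range(10000)]
print("estimated p(X5 = 1) ≈", np.mean(draws))
```

For the independence parts of this exercise, a d-separation checker (for example networkx's d-separation utilities) could be used to verify answers once the actual Figure 1 graph is encoded.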